Skip to content

feat(container-loader): add captureFullContainerState free function#27220

Open
markfields wants to merge 26 commits intomicrosoft:mainfrom
markfields:full-container-state
Open

feat(container-loader): add captureFullContainerState free function#27220
markfields wants to merge 26 commits intomicrosoft:mainfrom
markfields:full-container-state

Conversation

@markfields
Copy link
Copy Markdown
Member

@markfields markfields commented May 1, 2026

Adds captureFullContainerState, a @legacy @alpha free function in container-loader that
produces an IPendingContainerState JSON for an attached document without instantiating a
Loader, Container, or runtime
. It drives only IUrlResolver + IDocumentServiceFactory,
fetches the latest snapshot via getSnapshot/getSnapshotTree, drains ops via
fetchMessages(seq+1, …), and inlines blob contents and loading-group snapshots so the
artifact is fully portable.

This is the missing piece in the frozen-container series:

Together with loadFrozenContainerFromPendingState, captureFullContainerState enables
fully-offline frozen-container scenarios.

Design notes

  • GC respect is two-layered: snapshot tree walk skips unreferenced: true subtrees, and the
    attachment-blob filter parses GC tombstones / deleted-nodes with documented GC-lag
    tolerance (blobs absent from the GC graph are kept).
  • Concurrency is bounded (mapWithConcurrency: 32 blobs, 4 group snapshots).
  • Loading-group snapshots are intentionally NOT proactively captured — captureFullContainerState
    throws UsageError if any referenced subtree carries a groupId. This will be reintroduced when
    there's a known consumer and e2e harness.
  • Some snapshot/GC/blob-tree constants are duplicated from container-runtime. A
    cross-package contract test fails CI on drift.
  • Attachment-blob payloads are encoded via base64, which is different from existing inlined
    blobs since those are text-based payloads.

See also #27100, and this markdown file used during branch development,

Follow-ups

See this list of follow-ups accrued during development. These will be moved to ADO tasks once the PR is merged.

Additionally, these from Deep Review on a later iteration:

And some of my own thoughts on final review pass:

  • Do we have roundtrip test coverage of both loadExistingContainer and loadFrozenContainerFromPendingState?
  • Dedupe GC types (and code) or add contract tests to prevent drift

anthony-murphy and others added 10 commits April 20, 2026 10:48
Adds a driver-only free function that captures a container's current state
in the IPendingContainerState wire format using only an IDocumentServiceFactory
and IUrlResolver. Unlike Container.getPendingLocalState(), no runtime or
codeLoader is instantiated: the function fetches the latest snapshot, reads
the authoritative sequence number from the snapshot's attributes blob, drains
ops from delta storage from that sequence number, and serializes the result.
pendingRuntimeState is undefined, so the output is intended for state relay,
inspection, and durable-state snapshot use cases rather than rehydrating
in-flight DDS changes. The output can be fed back into loadExistingContainer
or loadFrozenContainerFromPendingState as pendingLocalState.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tainerPendingState

Extends captureContainerPendingState so the driver-level state is fully
self-contained for blob reads as well. Attachment blob bytes are fetched and
added to snapshotBlobs keyed by storage ID, which ContainerStorageAdapter
already serves through its cache — no wire-format change required.

GC state is consulted when present: blobs GC has explicitly marked
unreferencedTimestampMs, tombstoned, or deleted are skipped. Blobs absent
from the GC graph are kept, since GC lag can leave recently-attached blobs
off the graph and dropping them would lose live data. When the snapshot has
no GC tree (GC disabled or pre-GC document), every attachment blob from the
BlobManager redirect table is included.

The relevant blob manager / GC constants and the minimal parsing logic are
duplicated locally to avoid a loader → runtime dependency; comments point
back to the canonical definitions in container-runtime and
runtime-definitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ware

Extends the driver-only capture to cover the whole referenced graph of the
container, not just attachment blobs.

- Honours ISnapshotTree.unreferenced: a shared tree walker skips any subtree
  flagged unreferenced by the summarizer (which sets the flag from GC state)
  and inlines contents of every other blob it reaches. Replaces the
  unfiltered getBlobContentsFromTree path.
- Pre-fetches loading-group snapshots: enumerates groupIds on the base
  snapshot (skipping unreferenced subtrees), fetches each via
  IDocumentStorageService.getSnapshot({ versionId, loadingGroupIds }), runs
  the fetched snapshot through the same tree walker, and serialises the
  result into IPendingContainerState.loadedGroupIdSnapshots. If the driver
  lacks getSnapshot support or no groupIds are declared, no groups are
  included.
- GC parsing is done once and shared between tree-level and attachment-blob
  filtering. captureReferencedAttachmentBlobs now takes pre-parsed GC data.

Renames captureAttachmentBlobs.ts to captureReferencedContents.ts to reflect
the broader scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the reachability filtering and groupId-fetch paths that the local
end-to-end tests can't easily exercise (no summarizer runs in the local
test server, and TestFluidObjectFactory doesn't produce loading-group
datastores out of the box). Tests construct ISnapshotTree fixtures
directly and back readBlob/getSnapshot with an in-memory shim.

15 cases across readReferencedSnapshotBlobs, parseGcSnapshotData,
captureReferencedAttachmentBlobs, and captureGroupIdSnapshots — covering
unreferenced subtree skip, root .blobs special-casing, ISnapshot vs
ISnapshotTree input, GC lag tolerance (blobs absent from gcNodes are
kept), tombstone/deletedNodes skip, and groupId enumeration/dedup/fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptureFullContainerState

The function captures the whole referenced graph of the container (snapshot,
loading-group snapshots, inlined structural blobs, inlined attachment blobs,
trailing ops) — not just "pending state." The new name matches the scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.spec

Collapse a multi-line chained .get() call onto a single line to satisfy
biome's formatter — CI was failing on `biome check .` in local-server-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inerState

Four issues flagged by Copilot review on PR microsoft#27100:

1. captureFullContainerState created an IDocumentService but never called
   dispose(). Wrap the capture body in try/finally and dispose in the
   finally to release driver-held resources (sockets/caches).

2. readReferencedSnapshotBlobs fanned every blob read at every tree level
   into a single Promise.all, giving unbounded concurrency on large
   snapshots. Refactor into a collect-then-fetch pipeline: walk the tree
   synchronously to gather referenced blob ids, then fetch via a new
   mapWithConcurrency helper capped at 32 in-flight reads.

3. captureReferencedAttachmentBlobs had the same unbounded-parallel issue
   over all referenced attachment storage ids. Route through the same
   mapWithConcurrency helper.

4. collectUnreferencedBlobLocalIds returned undefined when gcData.gcState
   was undefined, silently dropping tombstones/deletedNodes filtering even
   when those lists were populated. Contradicted the function docs. Now
   always applies tombstones/deletedNodes regardless of gcState presence,
   and returns a (possibly empty) Set rather than undefined. Added a unit
   test covering the gcState-undefined-but-tombstones-present case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 8a748cc: captureGroupIdSnapshots still fanned every
getSnapshot call into a single Promise.all. Route through
mapWithConcurrency with a lower limit (4) since each call pulls a whole
snapshot tree, not a single blob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip cache

Two issues from PR review:

1. Bind getSnapshot when extracting it from the storage service. Real
   driver implementations reference `this` (e.g.,
   LocalDocumentStorageService.getSnapshot reads this.id), so calling
   the detached method would TypeError in strict mode. Mirrors the
   bind pattern in protocolTreeDocumentStorageService.ts:31. Added a
   class-based unit test stub whose getSnapshot touches `this` — would
   have caught this.

2. Pass cacheSnapshot: false on every getSnapshot call we make from the
   capture path. This capture is transient; we don't want to pollute the
   driver's snapshot cache with it. Covered by a unit test asserting the
   option is forwarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (1681 lines, 14 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
markfields added 7 commits May 5, 2026 20:33
- Wire-format consts POJO + contract test
- GC-interesting test
- Monitoring context wired (to be reverted)
- API report regenerated
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
@markfields markfields marked this pull request as ready for review May 6, 2026 00:01
Copilot AI review requested due to automatic review settings May 6, 2026 00:01
@markfields markfields requested a review from a team as a code owner May 6, 2026 00:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new driver-only capture API to @fluidframework/container-loader that can produce a portable IPendingContainerState JSON for an attached document without creating a Loader/Container/Runtime, enabling fully-offline frozen-container rehydration scenarios.

Changes:

  • Adds captureFullContainerState (@legacy @alpha) to capture the latest snapshot + post-snapshot ops, inline referenced snapshot blobs, and inline referenced attachment blob bytes (base64) for portability.
  • Extends the pending-state wire format with attachmentBlobContents (base64) and wires decoding/deduping through load (SerializedStateManager) and storage (PendingLocalStateStore).
  • Adds unit + local-server integration coverage, plus a contract test to detect drift in duplicated wire-format constants.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/loader/container-loader/src/createAndLoadContainerUtils.ts Implements captureFullContainerState and its props interface; snapshot/op capture and assembly of IPendingContainerState.
packages/loader/container-loader/src/captureReferencedContents.ts New GC-aware snapshot walker + attachment-blob capture helpers and exported wireFormatConstants.
packages/loader/container-loader/src/serializedStateManager.ts Adds attachmentBlobContents to IPendingContainerState and decodes/merges it into the blob cache on load.
packages/loader/container-loader/src/pendingLocalStateStore.ts Dedupes the new attachmentBlobContents map across stored pending states.
packages/loader/container-loader/src/containerStorageAdapter.ts Introduces IBase64BlobContents type to make the base64-vs-utf8 encoding contract explicit.
packages/loader/container-loader/src/index.ts Exports captureFullContainerState, ICaptureFullContainerStateProps, and wireFormatConstants (internal).
packages/loader/container-loader/src/test/captureReferencedContents.spec.ts Unit tests for GC parsing/filtering, referenced-blob walking, attachment-blob behavior, and loading-group detection.
packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts Local-server integration tests for capture → frozen rehydrate, ops after snapshot, nested DDS handles, and binary attachment blobs.
packages/test/local-server-tests/src/test/wireFormatConstants.spec.ts Contract test ensuring loader-duplicated wire-format constants match runtime/runtime-definitions sources.
packages/runtime/container-runtime/src/index.ts Re-exports internal blob-manager wire-format constants for the contract test.
packages/runtime/container-runtime/src/blobManager/blobManager.ts Marks blobManagerBasePath as @internal for extraction/export hygiene.
packages/runtime/container-runtime/src/blobManager/blobManagerSnapSum.ts Marks redirectTableBlobName as @internal for extraction/export hygiene.
packages/loader/container-loader/api-report/container-loader.legacy.alpha.api.md API report update for the new @legacy @alpha export and props interface.
.changeset/wide-foxes-behave.md Changeset for the new API and related internal exports.
full-container-state-review-notes.md Adds detailed review notes / design and coverage tracking document.

Comment thread packages/loader/container-loader/src/createAndLoadContainerUtils.ts Outdated
Comment thread packages/loader/container-loader/src/createAndLoadContainerUtils.ts
Comment thread packages/test/local-server-tests/src/test/captureFullContainerState.spec.ts Outdated
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts Outdated
Comment thread packages/loader/container-loader/src/createAndLoadContainerUtils.ts
Comment thread packages/loader/container-loader/src/serializedStateManager.ts
Comment thread packages/loader/container-loader/src/captureReferencedContents.ts
@markfields markfields requested review from dannimad and vladsud May 7, 2026 21:24
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  288641 links
    1922 destination URLs
    2172 URLs ignored
       0 warnings
       0 errors


@anthony-murphy
Copy link
Copy Markdown
Contributor

Deep Review

Reviewed commit c21964d on 2026-05-07.

Readiness: 7/10 — 🟢 ALMOST READY

Not ready for sign-off, but close. The prior round's two open Tier 2s are resolved on the code axis: the wire-format extension is now called out in wide-foxes-behave.md, and the revived-unreferenced self-containment hole is acknowledged with a follow-up commitment from the author. What's left is a cluster of polish/process items — explicit gatekeeper acks from anthony-murphy on the wire-format extension, UTF-8/base64 encoding split, and loader-side constant duplication; an AbortSignal API affordance before @alpha graduation; an op-stream contiguity assert; a small decoder helper; and tracking the revived-unreferenced follow-up before the API graduates.

Path to Ready

  • Resolve inline threads
  • Add anthony-murphy as a requested reviewer (or @-mention on serializedStateManager.ts, captureReferencedContents.ts, and wide-foxes-behave.md) so the wire-format extension, UTF-8/base64 encoding split, and loader-side constant duplication are explicitly acked by the area's gatekeeper (closes the process side of three threads as one batch)
  • File the revived-unreferenced-subtree follow-up task (ADO item) and link it from captureFullContainerState's JSDoc as a known self-containment caveat until the follow-up lands; consider the conservative UsageError interim if the follow-up will not land before @alpha graduation

Context for Reviewers

For human reviewer
  • anthony-murphy — Wire-format direction: optional field with JSDoc compat note (current) vs versioned envelope vs another shape; whether the IPendingContainerState literal should be constructed in captureFullContainerState directly or via a helper in serializedStateManager for shape ownership; whether the seven duplicated runtime constants belong in driver-definitions per AB#72934. Confirms or retires the "one format across layers" stance from PR Removal of omitted flag from snapshot as service will not supply it. #20507 in light of the binary-safety rationale.
  • dannimad — Procedurally required as consumer-side owner via Expose creation of a frozen document service factory. #25653 (createFrozenDocumentServiceFactory); confirm the captured artifact shape meets rehydrate-side expectations.
  • jatgarg — Procedural sign-off on the mixinMonitoringContext / configProvider divergence in createAndLoadContainerUtils.ts (the convention enforced on the sibling loadSummarizerContainerAndMakeSummary in PR On demand summarizer for on demand summaries #25394). Author's docblock acknowledges the divergence; confirm or require an optional configProvider? prop now for forward-compat.
  • ChumpChief — Alpha-surface scaling: the "how does this scale?" question from PR ContainerLoader: Move getPendingLocalState to legacy/alpha #25513 remains open across four alpha additions; resolve the convention (overload of asLegacyAlpha vs standalone free functions).
  • tyler-cai-microsoft — Confirm the loading-groups UsageError is acceptable scope and follow-up tracking captures the parity gap with PR [Data Virtualization & Offline]: Reimplement groupId offline with the loader #20565.
  • agarwal-navin — Confirm the GC-filter / tombstone-recovery tradeoff for the captured artifact (tombstoned blobs foreclosed by the artifact, different from the live runtime's allowTombstone recoverability), and weigh in on the revived-unreferenced deferral.
  • Cannot be assessed by the pipeline:
    • Real-world memory/throughput on long-lived documents with millions of trailing ops.
    • Whether mapWithConcurrency limits (32 blobs / 4 group snapshots) are correct under production driver-throttling regimes (notably ODSP).
    • Whether the captured artifact loaded by loadFrozenContainerFromPendingState reproduces identical runtime behavior across all document shapes; the local-server test exercises one round-trip only.
Review history (5 prior reviews)
  • 6e66f7b 2026-05-07 · 4/10 — post-snapshot blobAttach gap resolved; new revived-unreferenced self-containment hole and wire-format-extension JSDoc still open
  • 1a4bcfa 2026-05-06 · 4/10 — UTF-8 wire-format defect resolved; post-snapshot blobAttach gap and format-extension JSDoc surface
  • cdda9ba 2026-05-05 · 4/10 — UTF-8 wire-format defect resolved; post-snapshot blobAttach gap and format-extension JSDoc surface
  • 1d5a51b 2026-05-05 · 3/10 — UTF-8 attachment-blob round-trip flagged Tier 1
  • 57a1a76 2026-05-04 · 5/10 — GC-redirect-identity and unbounded-savedOps flagged (both later reclassified)

return { tree: snapshot, read: async (id) => storage.readBlob(id) };
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: UTF-8/base64 encoding split — process sign-off ask, no code change.

captureReferencedAttachmentBlobs writes base64 (bufferToString(buffer, "base64")) into a new IBase64BlobContents; readReferencedSnapshotBlobs writes UTF-8 into ISerializableBlobContents (serializedStateManager.ts:284-299, decode in containerStorageAdapter.ts:515-527). The split is technically correct and aligns with the runtime's existing pending-blob serializer (blobManager.ts:340-343, :951-960); the non-UTF-8 round-trip e2e test at captureFullContainerState.spec.ts:351-427 covers the binary case.

Process concern: vladsud's "we should stick to one format across layers for blobs" position from PR #20507 (with anthony-murphy explicitly looped in) hasn't been revisited on the record. vladsud is already requested; please also tag anthony-murphy so the area's "one format" stance is consciously superseded rather than silently bypassed. The binary-safety argument for the split is sound — no code change expected.

* Optional logger for driver-side telemetry.
*/
readonly logger?: ITelemetryBaseLogger | undefined;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Add abortSignal?: AbortSignal to ICaptureFullContainerStateProps before @alpha graduation.

ICaptureFullContainerStateProps exposes only urlResolver, documentServiceFactory, request, logger. The capture body concurrently reads snapshot/attachment blobs (bounded concurrency 32) and drains the op stream without a cancellation check. IDocumentDeltaStorageService.fetchMessages already accepts abortSignal?: AbortSignal (packages/common/driver-definitions/src/storage.ts:103-109), so the underlying primitives support it.

Suggested fix: add abortSignal?: AbortSignal to ICaptureFullContainerStateProps, propagate it into the mapWithConcurrency blob/group loops and the fetchMessages op-drain loop, and call signal.throwIfAborted() between bounded batches. Additive and cheaper to land before the API graduates from @alpha.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, adding it to follow-ups mentioned in PR description

attachmentBlobContents,
);
Object.assign(attachmentBlobContents, added);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Op-stream drain trusts the driver silently — assert contiguity to fail closed.

captureFullContainerState drains connectToDeltaStorage().fetchMessages(attributes.sequenceNumber + 1, …) into savedOps and emits whatever the driver yields. Other parts of the same function fail-closed on suspicious states (UsageError on loading groups; GenericError on empty snapshot fetch); ops are the one place that trusts the driver silently. A buggy driver / network truncation would produce an artifact whose ops cannot be replayed, surfacing as a confusing rehydration-time failure pointing at the loader rather than at the capture.

Suggested fix: track a previous-sequence-number across the drain loop and assert each op's sequenceNumber === prev + 1; throw on a gap with the offending pair in the message. Optional: also assert the snapshot's latestSequenceNumber >= attributes.sequenceNumber at fetch time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good point. A downside to not using the DeltaManager code in the container loader, I guess. Tracking as a follow-up.

gcBlobPrefix: "__gc",
gcTombstoneBlobKey: "__tombstones",
gcDeletedBlobKey: "__deletedNodes",
} as const;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Loader↔runtime constant duplication — gatekeeper nod ask, no code change.

This PR introduces a wireFormatConstants block in the loader mirroring seven runtime-side constants (.blobs, .redirectTable, _blobs, gc, __gc, __tombstones, __deletedNodes) and a cross-package deepStrictEqual contract test (wireFormatConstants.spec.ts). PR #21729 (anthony-murphy, 2024-07-02) was the deliberate refactor that produced these constants in their current form, with markfields reviewing for path-format discipline. This PR is the first cross-layer consumer; the chosen pattern (duplicate + contract test) sets precedent for any future shared wire-format constant.

Tag anthony-murphy for an explicit nod on the duplicate+contract-test layering choice, or merge with a clear note that AB#72934 will revisit (per thread #3192306888). No code change expected.

Comment on lines +334 to +336
urlResolver,
documentServiceFactory,
request,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthony-murphy thinking about replacing these 3 params with just documentService and having caller piece that together (maybe in follow-up PR). Does that work? I don't totally understand the environment in which this will be called and where these params will come from

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its tricky. overall i would love to move in a direction where fluid takes a document service, but it could be hard right now due lack of usage that way. there won't be any app examples, and i'm not sure the odsp driver supports it. i would be 100% down to support both maybe via overload or union to start moving in that direction, as we need to start somewhere

@markfields markfields enabled auto-merge (squash) May 8, 2026 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants